Setup

library('SampleQC')
library('patchwork')
# define what to use and annotate
group_list  = metadata(qc_obj)$group_list
n_groups    = metadata(qc_obj)$n_groups

Checking for outliers and QC batches via MMD

Maximum mean discrepancy (MMD) is a measure of dissimilarity of empirical (possibly multivariate) distributions. If \(X\) and \(Y\) are sampled from distributions \(D_x\) and \(D_y\), then \(E(MMD(X,Y)) = 0\) if and only if \(D_x = D_y\). SampleQC uses MMD to estimate similarities between the QC matrices of samples in a experiment. Viewed as equivalent to a distance, SampleQC uses the MMD values as input to multiple non-linear embedding approaches, and for clustering. This then allows users to identify possible batch effects in the samples, and groupings of samples which have similar distributions of QC metrics.

Plot MMD dissimilarity matrix

Heatmap of all pairwise dissimilarities between samples (values close to 0 indicate similar samples; values of 1 and higher indicate extremely dissimilar samples).

(plot_mmd_heatmap(qc_obj))

Plot over UMAP embedding with annotations

UMAP embedding of dissimilarity matrix, annotated with selected discrete and continuous values for each sample.

plot_embeddings(qc_obj, "discrete", "UMAP")

group_id

true_group

N_cat

mito_cat

counts_cat

plot_embeddings(qc_obj, "continuous", "UMAP")

log_N

med_mito

med_counts

Plot over MDS embedding with annotations

Multidimensional scaling (MDS) embedding of dissimilarity matrix, annotated with selected discrete and continuous values for each sample.

plot_embeddings(qc_obj, "discrete", "MDS")

group_id

true_group

N_cat

mito_cat

counts_cat

plot_embeddings(qc_obj, "continuous", "MDS")

log_N

med_mito

med_counts

Plot SampleQC model fits and outliers over QC biaxials

These plots show biaxial distributions of each sample, annotated with both the fitted mean and covariance matrices, and the cells which are then identified as outliers. You can use this to check that you have the correct number of components for each sample grouping, and to check that the fitting procedure has worked properly. The means and covariances of the components should match up to the densest parts of the biaxial plots.

for (g in group_list) {
    cat('## ', g, '{.tabset}\n')
    # which samples?
    samples_g   = sort(colData(qc_obj)$sample_id[ colData(qc_obj)$group_id == g ])
    for (s in samples_g) {
        cat('### ', s, ' \n')
        g_fit   = plot_fit_over_biaxials_one_sample(qc_obj, s)
        g_out   = plot_outliers_one_sample(qc_obj, s)
        g       = g_fit / g_out
        print(g)
        cat('\n\n')
    }
}

SG1

sample31

sample32

sample33

sample34

sample35

sample36

sample37

sample38

sample39

sample40

sample41

sample42

sample43

sample44

sample45

sample46

sample47

sample48

sample49

sample51

sample52

sample53

SG2

sample50

sample54

sample55

sample56

sample57

sample58

sample59

sample60

sample61

sample62

sample63

sample64

SG3

sample10

sample11

sample12

sample13

sample14

sample15

sample19

sample20

sample21

sample22

sample24

SG4

sample16

sample17

sample18

sample23

sample25

sample26

sample27

sample28

sample29

sample30

SG5

sample01

sample02

sample03

sample04

sample05

sample06

sample07

sample08

sample09

Plot parameters

These plots show the fitted parameters for each sample and each mixture component. There are two sets of parameters: \(\alpha_j\), the mean shift for each sample; and \((\mu_k, \Sigma_k)\), the relative means and covariances for each mixture component.

\(\alpha_j\) values

Values of \(\alpha_j\) which are extreme relative to those for most other samples indicate samples which are either outliers in terms of their QC statistics, or have been allocated to the wrong sample grouping.

# plot likelihoods
for (g in group_list) {
    cat('### ', g, '\n')
    print(plot_alpha_js_likelihoods(qc_obj, g))
    cat('\n\n')
}

SG1

SG2

SG3

SG4

SG5

These plots show the same \(\alpha_j\) values, but as biaxials, and equivalently for PCA projections.

\(\alpha_j\) PCA values

for (g in group_list) {
    cat('### ', g, ' feats\n')
    print(plot_alpha_js(qc_obj, g, qc_idx=1:2, pc_idx=1:2))
    cat('\n\n')
    cat('### ', g, ' mito\n')
    print(plot_alpha_js(qc_obj, g, qc_idx=c(1,3), pc_idx=c(1,3)))
    cat('\n\n')
}

SG1 feats

SG1 mito

SG2 feats

SG2 mito

SG3 feats

SG3 mito

SG4 feats

SG4 mito

SG5 feats

SG5 mito

These plots show the composition of each sample in terms of the \(K\) different mixture components, plus outliers.

Splits across components

# plot likelihoods
for (g in group_list) {
    cat('### ', g, '\n')
    print(plot_beta_ks(qc_obj, g))
    cat('\n\n')
}

SG1

SG2

SG3

SG4

SG5